February 16, 2021

Gene expression

I begin the analysis by identifying differentially expressed genes.

To do so, I test whether mean expression differs between the control and disease groups, starting with a simple two-sample t-test. For each gene, I check the equality of variances before comparing the group means.

P-values from t-test: before and after correction

Gene expression - corrected test scheme

The number of differentially expressed genes proved to be very high, so I check whether the normality assumption distorts the results by applying a new test scheme:

  • Check whether the expression values in both groups are normally distributed.
  • If they are, perform a t-test, checking the equality of variances beforehand.
  • If they are not, perform a Mann-Whitney test.
  • Correct the obtained p-values for multiple testing with the Benjamini-Hochberg method.
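
The last step of the scheme, the Benjamini-Hochberg correction, can be sketched as follows. This is a minimal standard-library Python illustration, not the code used for the analysis; the function name `benjamini_hochberg` and the example p-values are hypothetical. The per-gene tests (normality check, t-test, Mann-Whitney) are assumed to have already produced the raw p-values.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

Each adjusted value is the raw p-value scaled by the number of tests over its rank, capped at 1 and never larger than the adjusted value of any higher-ranked p-value.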

P-values after distribution consideration

Distribution effect

I compare the results obtained when the distribution is taken into account with those obtained under the earlier normality assumption.

Enrichment analysis

After identifying the differentially expressed genes, I proceed with enrichment analysis. I start with ORA (over-representation analysis), then move on to FCS (functional class scoring) methods.

ORA

##                                                 Title corrected_pvals
## 100                                     RNA transport    4.259285e-08
## 133                                        Cell cycle    4.259285e-08
## 305                               MicroRNAs in cancer    1.182302e-07
## 260                                 Alzheimer disease    1.960257e-06
## 265                                     Prion disease    1.960257e-06
## 295           Human T-cell leukemia virus 1 infection    1.960257e-06
## 300                                Pathways in cancer    2.529952e-06
## 88                                 Metabolic pathways    4.283633e-06
## 263                                Huntington disease    4.283633e-06
## 266 Pathways of neurodegeneration - multiple diseases    4.546480e-06
## 115                            Fanconi anemia pathway    6.612935e-06
## 170                                    Focal adhesion    6.612935e-06
## 304                           Proteoglycans in cancer    6.612935e-06
## 262                     Amyotrophic lateral sclerosis    7.685330e-06
## 101                         mRNA surveillance pathway    2.554206e-05
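
ORA is commonly based on a one-sided hypergeometric test: given the number of differentially expressed genes, how surprising is the overlap with a gene set? The sketch below assumes that formulation; the function name `ora_pvalue` and the toy counts are hypothetical and are not taken from the analysis above.

```python
from math import comb


def ora_pvalue(n_universe, n_de, n_set, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_universe: number of genes tested
    n_de:       number of differentially expressed genes
    n_set:      number of genes in the gene set (within the universe)
    n_overlap:  number of differentially expressed genes in the set
    """
    max_overlap = min(n_de, n_set)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_de - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Toy example: 20 genes, 5 differentially expressed, a set of 4 genes,
# 2 of which are differentially expressed.
p_toy = ora_pvalue(20, 5, 4, 2)
```

The per-set p-values obtained this way would then be corrected for multiple testing across all gene sets.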

CERNO

##                                       Title corrected_pvals
## 88                       Metabolic pathways    2.368986e-08
## 133                              Cell cycle    2.368986e-08
## 300                      Pathways in cancer    2.368986e-08
## 295 Human T-cell leukemia virus 1 infection    4.941884e-08
## 170                          Focal adhesion    1.231707e-07
## 305                     MicroRNAs in cancer    5.131701e-07
## 159      Vascular smooth muscle contraction    5.800451e-07
## 116                  MAPK signaling pathway    2.139096e-06
## 119                  Rap1 signaling pathway    2.139096e-06
## 125             Chemokine signaling pathway    4.895844e-06
## 304                 Proteoglycans in cancer    4.895844e-06
## 148              PI3K-Akt signaling pathway    5.527751e-06
## 144                             Endocytosis    5.931361e-06
## 177     Complement and coagulation cascades    5.931361e-06
## 121              cGMP-PKG signaling pathway    7.034663e-06
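
CERNO scores a gene set from the ranks of its genes in the ordered gene list: the statistic is -2 times the sum of ln(rank / N), compared against a chi-square distribution with twice as many degrees of freedom as there are genes in the set. The sketch below illustrates that idea in standard-library Python; the function names and example ranks are hypothetical, and the closed-form chi-square tail is valid only for even degrees of freedom, which is always the case here.

```python
from math import exp, log


def chi2_sf_even_df(x, df):
    """Chi-square survival function, closed form for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # Sum of (x/2)^i / i! for i = 0 .. df/2 - 1.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return exp(-half) * total


def cerno_pvalue(ranks_in_set, n_genes):
    """Return (statistic, p-value) for one gene set.

    ranks_in_set: 1-based ranks of the set's genes in the ordered gene list
    n_genes:      total number of ranked genes
    """
    stat = -2.0 * sum(log(r / n_genes) for r in ranks_in_set)
    return stat, chi2_sf_even_df(stat, 2 * len(ranks_in_set))


# Hypothetical sets of three genes among 100 ranked genes.
stat_top, p_top = cerno_pvalue([1, 2, 3], 100)
stat_bottom, p_bottom = cerno_pvalue([98, 99, 100], 100)
```

A set whose genes sit near the top of the list gets a large statistic and a small p-value; a set concentrated at the bottom gets a p-value close to 1.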

Z-transform

##                                                 Title corrected_pvals
## 88                                 Metabolic pathways    9.001854e-28
## 300                                Pathways in cancer    1.231452e-22
## 266 Pathways of neurodegeneration - multiple diseases    9.693740e-16
## 295           Human T-cell leukemia virus 1 infection    1.234043e-15
## 148                        PI3K-Akt signaling pathway    2.047843e-14
## 116                            MAPK signaling pathway    6.369432e-14
## 133                                        Cell cycle    6.369432e-14
## 170                                    Focal adhesion    6.369432e-14
## 260                                 Alzheimer disease    2.115722e-13
## 305                               MicroRNAs in cancer    3.002434e-13
## 304                           Proteoglycans in cancer    4.702153e-13
## 265                                     Prion disease    3.590622e-12
## 263                                Huntington disease    4.907427e-12
## 119                            Rap1 signaling pathway    6.906820e-12
## 294                    Human papillomavirus infection    5.373930e-11
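
Assuming that "Z-transform" here refers to Stouffer's method, the gene-level p-values of a set are converted to standard normal scores and summed, then divided by the square root of the set size. A minimal sketch under that assumption follows; the function name `stouffer_pvalue` is hypothetical, and p-values must lie strictly between 0 and 1 because `NormalDist.inv_cdf` rejects the endpoints.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_pvalue(gene_pvals):
    """Return (combined z-score, one-sided p-value) for one gene set."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Small p-values map to large positive z-scores.
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in gene_pvals]
    z_combined = sum(z_scores) / sqrt(len(z_scores))
    return z_combined, 1.0 - std_normal.cdf(z_combined)


# Hypothetical gene-level p-values for one gene set.
z_set, p_set = stouffer_pvalue([0.01, 0.20, 0.03, 0.50])
```

Because every gene in the set contributes, many moderately small p-values can add up to a very small set-level p-value.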

GSEA implementation

Signal to noise absolute - p-values

We can see that the MATLAB output is clearly anomalous.

Signal to noise absolute - ES

## P-value:  4.997284e-42
## Correlation coefficient:  -0.6518573

Signal to noise - p-values

Signal to noise - ES

## P-value:  0.261582
## Correlation coefficient:  0.06141717

LFC absolute - p-values

LFC absolute - ES

## P-value:  2.1936e-26
## Correlation coefficient:  -0.5360192

LFC - p-values

LFC - ES

## P-value:  0.007659464
## Correlation coefficient:  0.1452507
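
The four GSEA variants above differ only in the gene-level ranking metric: signal to noise or log fold change (LFC), each signed or in absolute value. The sketch below shows the usual definitions of these metrics. The function names, the disease-minus-control direction, and the assumption that expression values are positive and on a linear scale are all illustrative choices, not details confirmed by the analysis.

```python
from math import log2
from statistics import mean, stdev


def signal_to_noise(disease, control):
    """Difference of group means divided by the sum of group standard deviations."""
    return (mean(disease) - mean(control)) / (stdev(disease) + stdev(control))


def log2_fold_change(disease, control):
    """log2 ratio of group means (expects positive, linear-scale values)."""
    return log2(mean(disease) / mean(control))


def rank_genes(scores, absolute=False):
    """Order gene names by decreasing score, optionally by absolute value."""
    if absolute:
        return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return sorted(scores, key=lambda g: scores[g], reverse=True)


# Hypothetical scores for three genes.
toy_scores = {"geneA": -3.0, "geneB": 2.0, "geneC": 0.5}
signed_order = rank_genes(toy_scores)
absolute_order = rank_genes(toy_scores, absolute=True)
```

With signed scores, strongly down-regulated genes end up at the bottom of the list; with absolute scores they move to the top alongside the up-regulated ones, which changes which gene sets accumulate a high enrichment score.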

PLAGE

##                                                    Title corrected_pvals
## 56                        Glycerophospholipid metabolism    9.275123e-53
## 171                             ECM-receptor interaction    9.405913e-53
## 238                                   Insulin resistance    1.147170e-52
## 233                            Relaxin signaling pathway    8.115912e-52
## 240 AGE-RAGE signaling pathway in diabetic complications    9.085146e-52
## 88                                    Metabolic pathways    5.763372e-51
## 148                           PI3K-Akt signaling pathway    6.818963e-51
## 27                       Arginine and proline metabolism    1.392011e-50
## 254                     Protein digestion and absorption    1.478016e-50
## 305                                  MicroRNAs in cancer    1.478016e-50
## 235  Parathyroid hormone synthesis, secretion and action    1.543308e-50
## 173                                    Adherens junction    2.323504e-50
## 273                            Vibrio cholerae infection    2.323504e-50
## 144                                          Endocytosis    3.092973e-50
## 166                             Apelin signaling pathway    3.250911e-50

GSVA

##                                        Title corrected_pvals
## 159       Vascular smooth muscle contraction    1.514201e-41
## 178                      Platelet activation    3.632513e-41
## 228               Oxytocin signaling pathway    4.745869e-38
## 231                          Renin secretion    2.135954e-37
## 166                 Apelin signaling pathway    2.934893e-37
## 131        Phospholipase D signaling pathway    4.966612e-35
## 121               cGMP-PKG signaling pathway    5.562491e-35
## 119                   Rap1 signaling pathway    1.047192e-34
## 109                   PPAR signaling pathway    1.069441e-33
## 230    Regulation of lipolysis in adipocytes    2.691752e-32
## 118                    Ras signaling pathway    1.665631e-31
## 185 C-type lectin receptor signaling pathway    1.665631e-31
## 116                   MAPK signaling pathway    2.880683e-31
## 211                     Long-term depression    1.213920e-30
## 204           Neurotrophin signaling pathway    2.380521e-30

log P-values correlation

Results comparison

##                    ORA CERNO   Z PLAGE LFC LFC_abs S2N S2N_abs GSVA
## enriched gene sets 114   200 263   338   0       0   1       1  267

Absolute signal to noise: MicroRNAs in cancer
Signal to noise: cGMP-PKG signaling pathway

Joint gene sets (besides GSEA)

## Number of joint enriched gene sets:  101

Combining p-values

##                                                    Title pval_combined
## 88                            cGMP-PKG signaling pathway  0.000000e+00
## 253                                  MicroRNAs in cancer  0.000000e+00
## 249                                   Pathways in cancer  1.584990e-55
## 121                   Vascular smooth muscle contraction  1.472594e-50
## 132                                       Focal adhesion  2.176197e-49
## 86                                Rap1 signaling pathway  3.639168e-48
## 83                                MAPK signaling pathway  7.100317e-48
## 252                              Proteoglycans in cancer  3.974759e-47
## 140                                  Platelet activation  1.954365e-46
## 99                                            Cell cycle  3.033170e-45
## 194 AGE-RAGE signaling pathway in diabetic complications  1.031549e-43
## 245              Human T-cell leukemia virus 1 infection  1.504357e-42
## 108                                          Endocytosis  1.713326e-42
## 136                                       Tight junction  2.279999e-42
## 85                                 Ras signaling pathway  1.532563e-41

Visualizations

cGMP-PKG signaling pathway

MicroRNAs in cancer

Pathways in cancer

Vascular smooth muscle contraction

Focal adhesion

Rap1 signaling pathway

MAPK signaling pathway

Proteoglycans in cancer

Platelet activation

Cell cycle